- Sequencing -> Barracoda pipeline -> our wrangling + visualization
- Aim: build a data wrangling and visualization pipeline downstream of the Barracoda pipeline to explore sequence hits
2022-05-07
The raw Excel file contains several sheets
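library(readxl)  # provides read_excel() and excel_sheets()
setwd("/cloud/project")
# Without a `sheet` argument, read_excel() reads only the first sheet
data <- read_excel("data/_raw/project_data_raw.xlsx")
data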
## # A tibble: 118 × 17
##    barcode sample count.1 input.1 input.2 input.3 log_fold_change     p
##    <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>           <dbl> <dbl>
##  1 A18B200 BC372       20     221     128     172          -1.20  0.356
##  2 A19B200 BC372       62     325     203     292          -0.228 0.929
##  3 A20B200 BC372       20     167     132     155          -1.03  0.475
##  4 A21B200 BC372       29     260     170     215          -0.981 0.475
##  5 A22B200 BC372       42     307     185     212          -0.565 0.827
##  6 A23B200 BC372       48     369     247     291          -0.748 0.662
##  7 A24B200 BC372       55     327     196     277          -0.361 0.929
##  8 A25B200 BC372       77     528     348     456          -0.619 0.777
##  9 A18B201 BC372       27     174     116     132          -0.474 0.929
## 10 A19B201 BC372       25     158      90     139          -0.447 0.929
## # … with 108 more rows, and 9 more variables: `-log10(p)` <dbl>,
## #   `masked_p (p = 1 if logFC < 0)` <dbl>, `-log10(masked_p)` <dbl>,
## #   `count.normalised (edgeR)` <dbl>, `input.normalised (edgeR)` <dbl>,
## #   HLA <chr>, Origin <chr>, Peptide <chr>, Sequence <chr>
library(dplyr)  # provides bind_rows()
setwd("/cloud/project")
# Get the names of all sheets in the Excel file
sheet <- excel_sheets("data/_raw/project_data_raw.xlsx")
# Read each sheet into its own data frame, collected in a list named by sheet
data_frame <- lapply(setNames(sheet, sheet),
                     function(x) read_excel("data/_raw/project_data_raw.xlsx",
                                            sheet = x))
# Row-bind the per-sheet data frames; `.id` stores the source sheet name
data_frame <- bind_rows(data_frame,
                        .id = "Sheet")
data_frame
## # A tibble: 1,770 × 18
##    Sheet barcode sample count.1 input.1 input.2 input.3 log_fold_change     p
##    <chr> <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>           <dbl> <dbl>
##  1 BC372 A18B200 BC372       20     221     128     172          -1.20  0.356
##  2 BC372 A19B200 BC372       62     325     203     292          -0.228 0.929
##  3 BC372 A20B200 BC372       20     167     132     155          -1.03  0.475
##  4 BC372 A21B200 BC372       29     260     170     215          -0.981 0.475
##  5 BC372 A22B200 BC372       42     307     185     212          -0.565 0.827
##  6 BC372 A23B200 BC372       48     369     247     291          -0.748 0.662
##  7 BC372 A24B200 BC372       55     327     196     277          -0.361 0.929
##  8 BC372 A25B200 BC372       77     528     348     456          -0.619 0.777
##  9 BC372 A18B201 BC372       27     174     116     132          -0.474 0.929
## 10 BC372 A19B201 BC372       25     158      90     139          -0.447 0.929
## # … with 1,760 more rows, and 9 more variables: `-log10(p)` <dbl>,
## #   `masked_p (p = 1 if logFC < 0)` <dbl>, `-log10(masked_p)` <dbl>,
## #   `count.normalised (edgeR)` <dbl>, `input.normalised (edgeR)` <dbl>,
## #   HLA <chr>, Origin <chr>, Peptide <chr>, Sequence <chr>
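How `.id = "Sheet"` labels rows when a named list of data frames is bound together can be seen in a minimal sketch on toy data. The list names (`sheet_a`, `sheet_b`), barcodes and counts below are hypothetical stand-ins, not values from the project file.

```r
library(dplyr)

# Hypothetical toy data standing in for two Excel sheets
sheets <- list(
  sheet_a = tibble(barcode = c("X1", "X2"), count = c(10, 20)),
  sheet_b = tibble(barcode = "X3", count = 30)
)

# bind_rows() stacks the data frames; `.id` adds a column holding
# the list name each row came from
combined <- bind_rows(sheets, .id = "Sheet")

# Sanity check: number of rows contributed by each sheet
rows_per_sheet <- count(combined, Sheet)
rows_per_sheet
```

Running the same `count(data_frame, Sheet)` check on the combined project data would be one way to confirm that every sheet was read.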
- The database has a special format
Questions?